Robust speech detection and segmentation for real-time ASR applications
Abstract
This paper presents a robust speech detection method that can be applied across a variety of tasks. The method is based on an algorithm that performs non-parametric estimation of the background noise spectrum using minimum statistics of the smoothed short-time Fourier transform (STFT). The algorithm is shown to operate effectively under varying signal-to-noise ratios. Results are reported on two tasks, HMIHY and SPINE, which differ in speaking style, background noise type, and bandwidth. With a computational cost of less than 2% of real time on a 1 GHz P-3 machine and a latency of 400 ms, it is suitable for real-time ASR applications.
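As a rough illustration of the idea named in the abstract, the sketch below tracks a noise floor as the running minimum of a recursively smoothed STFT power spectrum and flags frames that rise above it. It is not the paper's algorithm: the function name `detect_speech`, the frame and hop sizes, the smoothing constant, the window length, and the 6 dB threshold are all assumptions chosen for the example, and no bias compensation or hangover logic is included.

```python
import numpy as np

def detect_speech(signal, frame_len=256, hop=128, smooth=0.9,
                  window_frames=50, threshold_db=6.0):
    """Flag frames whose smoothed power exceeds a minimum-statistics noise floor.

    All parameter defaults are illustrative assumptions, not values from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop

    # Short-time power spectrum: one row per frame, one column per frequency bin.
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2

    # First-order recursive smoothing over time, independently per bin.
    smoothed = np.empty_like(power)
    smoothed[0] = power[0]
    for t in range(1, n_frames):
        smoothed[t] = smooth * smoothed[t - 1] + (1.0 - smooth) * power[t]

    # Noise estimate: per-bin minimum of the smoothed power over a trailing window.
    noise = np.empty_like(smoothed)
    for t in range(n_frames):
        start = max(0, t - window_frames + 1)
        noise[t] = smoothed[start:t + 1].min(axis=0)

    # Frame-level ratio of smoothed power to estimated noise power, in dB.
    eps = 1e-12
    snr_db = 10.0 * np.log10((smoothed.sum(axis=1) + eps) /
                             (noise.sum(axis=1) + eps))
    return snr_db > threshold_db
```

In this sketch the trailing window must span more frames than the longest expected stretch of continuous speech; otherwise the minimum is taken entirely inside the speech and the noise floor rises to the speech level.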
Similar resources
Robust Automatic Continuous Speech Segmentation for Indian Languages to Improve Speech to Speech Translation
This paper provides an analysis of phrase and word boundary detection in background noise, in the context of Automatic Speech Recognition (ASR) and Text-To-Speech (TTS) synthesis systems for Indian languages. ASR and TTS are the major components of a Speech-To-Speech Translation (STST) system. Both need the speech signal to be segmented into basic units like phrases...
Improving the performance of MFCC for Persian robust speech recognition
Mel-frequency cepstral coefficients (MFCCs) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in Automatic Speech Recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing which applies to t...
A Robust, Real-time Endpoint Detector with Energy Normalization for ASR in Adverse Environments
When automatic speech recognition (ASR) is applied to hands-free or other adverse acoustic environments, endpoint detection and energy normalization can be crucial to the entire system. In low signal-to-noise ratio (SNR) situations, conventional approaches to endpointing and energy normalization often fail, and ASR performance usually degrades dramatically. The goal of this paper is to find a fast, acc...
Robust, real-time endpoint detector with energy normalization for ASR in adverse environments
When automatic speech recognition (ASR) is applied to hands-free or other adverse acoustic environments, endpoint detection and energy normalization can be crucial to the entire system. In low signal-to-noise ratio (SNR) situations, conventional approaches to endpointing and energy normalization often fail, and ASR performance usually degrades dramatically. The goal of this paper is to find a fast, acc...
Robust Potato Color Image Segmentation using Adaptive Fuzzy Inference System
Potato image segmentation is an important part of image-based potato defect detection. This paper presents a robust potato color image segmentation through a combination of a fuzzy rule based system, an image thresholding based on Genetic Algorithm (GA) optimization and morphological operators. The proposed potato color image segmentation is robust against variation of background, distance and ...